Abstract

The unusual incidence patterns for nasopharyngeal carcinoma (NPC) in China, Northeast India, Arctic 
Inuit, Peninsular and island Southeast Asia, Polynesian Islanders, and North Africans indicate a role for 
NPC risk genes in Chinese, Chinese-related, and not-obviously Chinese-related populations. Renewed 
interest in NPC genetic risk has been stimulated by a hypothesis that NPC population patterns originated 
in Bai-Yue / pre-Austronesian-speaking aborigines and were dispersed following the last glacial maximum
by Sundaland submersion. Five articles in this issue of the Chinese Journal of Cancer, first presented at a 
meeting on genetic aspects of NPC [National Cancer Center of Singapore (NCCS), February 20-21, 
2010], are directed towards incidence patterns, to early detection of affected individuals within risk 
populations, and to the application of genetic technology advances to understanding the nature of high 
risk. Turnbull presents a general framework for understanding population migrations that underlie NPC 
and similar complex diseases, including other viral cancers. Trejaut et al. apply genetic markers to detail
migration from East Asia through Taiwan to the populating of Island Polynesia. Migration dispersal in a 
westward direction took mongoloid peoples to modern day Northeast India adjacent to Western China 
(Xinjiang). NPC incidence in mongoloid Nagas ranks amongst the highest in the world, whereas 
elsewhere in India NPC is uncommon. Cao et al. detail incidence patterns in Southeast China that have 
occurred over recent decades. Finally, Ji et al. describe the utility of Epstein-Barr virus serostatus in 
early NPC detection. While genetic risk factors still remain largely unknown, human leukocyte antigen 
(HLA) genes have been a focus of attention since the discovery of an HLA association with NPC in 1973
and the finding, two years later, that NPC susceptibility in highest-risk Cantonese involved the co-occurrence
of multi-HLA locus combinations of HLA genes as chromosome combinations, or haplotypes (e.g. HLA-A2-B46),
whereas in relatively lower-risk non-Cantonese Chinese (Hokkiens, Teochews) they appeared to act 
independently, a strength of association reflecting the 30-50-fold difference in incidence between highest 
risk Cantonese and lowest-risk Indians. The prototypic haplotype HLA-A2-B46 extends over megabases. 
An upstream DNA segment (near HLA-DPA1) has close similarity to Gorilla, with no obvious homology to
Chimpanzee in current databases, suggesting that a reticulate model of primate evolution may be more 
appropriate than simple phylogeny. The DNA variation level in this segment is high enough for it to be a 
hominin remnant. HLA-B46 arose in mongoloids and remains largely limited to Chinese so the question 
arises as to whether the hominin candidate segment indicates an eastward trek of Homo neanderthalensis 
or the survival of much earlier Homo erectus. In 2011, sequencing technologies have finally caught up
with the requirement to separate parental haplotypes. Recently achieved chromosome separation for 
whole genome di-haploid genetic and epigenetic analysis of parental inheritance in single individuals will 
reveal interacting patterns of multi-locus haplotypes as humans move in and through successive 
environments, thus providing definitive information on the genetic affinities between extant populations, 
and of the migrations that have led to the global distribution of modern Homo. The challenge of seeking
HLA-associated locations both within and outside the HLA complex on each of the pair of chromosomes
can now be met, not only for NPC but, more broadly, for every disease.

Genetic aspects of nasopharyngeal carcinoma (NPC)
were the focus of a meeting at the National Cancer
Center of Singapore (NCCS) on February 20-21, 2010.
Manuscripts arising from presentations at that meeting 
will be published in the Chinese Journal of Cancer 
(CJC). Readers of a cancer journal may be surprised 
that a sociologist of science receives a place of 
prominence in the five papers published this month. The 
reason is that population incidence is a central enigma of 
all cancer occurrences. NPC has a highly unusual 
global geographic distribution, and thus provides a 
particular opportunity for hypothesis generation and 
testing. A hypothesis proposed by Wee et al. prompted
the Singapore meeting and has injected renewed 
enthusiasm into NPC genetic risk inquiry. This unitary 
NPC origin hypothesis proposes that the population 
pattern of NPC occurrence can be explained by a 
genetic lesion originating in Bai-Yue-speaking ("proto-Tai 
Kadai" or "pre-Austronesian" or "proto-Zhuang") 
aborigines that were dispersed by Sundaland 
submersion following the last glacial maximum.
Dispersal from the putative founding population occurred 
in all directions: Northeast for the Arctic Inuit, Southeast 
for more proximal Land Dyak Bidayuh-speakers of East 
Malaysia and further distant Maori and Polynesian 
Islanders, West to the Northeast of India, and South to 
island Southeast Asia. The articles in this edition of the 
CJC are directed towards incidence patterns, the 
application of genetic technologies to understanding the 
advent of NPC high risk populations, and to early 
detection of affected individuals within these populations. 

The relevance of Turnbull's article to NPC is that
understanding the bizarre global geographic incidence 
patterns, including east and north African populations, 
requires the unravelling of the diaspora of populations 
that exhibit high NPC risk through tracing the differing 
paths of artefacts, language and limited genetic 
biomarkers that currently lead to conflicting stories. 
Turnbull presents a general framework for considering
human population migrations which is central to testing 
the Wee unitary NPC origin hypothesis and exhorts us to 
remember that understanding human movement involves 
more than just the usual archaeological, cultural-linguistic 
and genetic processes. He reminds us of the social 
dimensions, and argues that the totality of interacting 
components can be conceived of either as an emergent 
complex adaptive system in action, or as being unifiable 
in a grand synthesis; and that such conflicting 
approaches should be held in tension with one another. 
Indeed, there is good reason to consider social practices 
since the Bai-Yue aboriginal origination hypothesis arose 
as a result of Wee's sociological insight that several 
high NPC incidence populations all practised a similar 
form of bamboo pole dance (tinikling). As Wee et al.
detailed, common ancestry between the Bai-Yue and
other high NPC risk aboriginal peoples (of Borneo,
Northeast India, Arctic Inuit, Austronesian 
Malayo-Polynesians of Southeast Asia, and Polynesians 
of Oceania) is supported by other shared cultural 
characteristics. Whether increased NPC occurrences 
among populations in east and north Africa that are not 
obviously related to Chinese share the expected 
genome-wide risk pattern variations will be a definitive 
test of any global unifying genetic hypothesis. Wee's 
preferred mechanism has the female as the more 
important bearer of transmission, noting that there 
appears to be a step-wise reduction in age-standardised 
rate (ASR) with every migration and intermixing of high 
risk and low risk populations as a "genetic dilution" 
(personal communication, 2010). He clarifies his 
hypothesis as "2 hits" involving an X-linked recessive 
mutation as the "1st hit", and human leukocyte antigen
(HLA) immunity as the "2nd hit". He identifies
involvement of the X-chromosome with Epstein-Barr
virus (EBV) infections, citing systemic lupus
erythematosus and X-linked lymphoproliferative disorder.
Wee suggests that, while the EBV infection in the "1st 
hit" is innocuous, the inability to mount an HLA-based 
effective immune response would result in an 
uncontrollable proliferation of EBV-infected cells, initiating 
the carcinogenic cascade that results in NPC. There are 
probably multiple pathways that allow this step to occur, 
accounting for the different HLA haplotypes in different 
patients (personal communication, 2010). 

In this CJC edition Trejaut et al. apply genetic
markers to detail migration from East Asia through
Taiwan resulting in the populating of Island Polynesia.
This is followed by NPC incidence descriptions of the
Northeast Indian provinces adjacent to Western China
(Xinjiang). The incidence of NPC in the mongoloid
Nagas is amongst the highest in the world, contrasting
starkly with that of other populations in Assam and
elsewhere in India where NPC is rarely seen, with an
incidence level even lower than that in Caucasians
(0.5/100 000-2.0/100 000 per year). Among the Land
Dyaks in Sarawak, East Malaysia, occurrence is similar
to, if not higher than, that in the Nagas. Cao et al.
detail incidence patterns in south eastern China, and
changes that have occurred over recent decades. This
review should be read in conjunction with other articles
on incidence changes indicating that environmental and
lifestyle changes play an important role in the declining
incidence of NPC over time in some populations.
Finally, Ji et al. describe the utility of EBV serostatus
in early NPC detection.

From the viewpoint of NPC genetics, there are now 37
years of evidence that genetic elements associated with
HLA within the major histocompatibility complex (MHC)
are major contributors to differential NPC risk among
southern Chinese. As early as 1975 it was established
that NPC susceptibility in highest risk Cantonese
involved the co-occurrence of multi-HLA locus
combinations of HLA genes as chromosome
combinations, or haplotypes (e.g. HLA-A2-B46),
whereas in relatively lower risk non-Cantonese Chinese
(Hokkiens, Teochews) they appeared to act
independently, a strength of association reflecting the
30-50-fold difference in incidence between highest risk
Cantonese and lowest-risk Indians.

HLA genetics in NPC was recently reviewed as a
commentary to the Bai-Yue hypothesis. Within the
MHC, the HLA genes are of dominant importance in
immuno-inflammatory biogenetics. However, it is
essential to remember that the known functions of other
MHC genes concern fundamental cellular processes,
whereas those of the majority of MHC genes remain to be
revealed. NPC does occur before 30 years of age,
but it is predominantly a disease of older age, so the
roles of other reproductively transmitted genes have to
be considered. For instance, a recent genome-wide
association study (GWAS) identified three new
susceptibility genes. This large GWAS, comprising
approximately 5 000 patients and 5 000 controls of southern 
Chinese descent, has established beyond any doubt that 
the HLA complex is a primary location of NPC risk, 
comprising multiple risk regions, with the "top single 
nucleotide polymorphism (SNP)" having amongst the 
highest statistically significant value of any published 
GWA study. 

The challenge now is to identify location(s) both 
within and outside the HLA complex that underlie such 
genetic associations, and to determine whether they are 
required to be present on both of the pair of 
chromosomes as simple or compound obligate recessive 
traits, or whether a single, dominant, gene lesion dose
is sufficient. Recent studies have confirmed the original
report that HLA genetic involvement in NPC concerns
multilocus haplotypes, and requires the characterisation
both of extended haplotypes and of intra-haplotypic
relations between primary locus alleles. Further
analysis of the GWAS SNP data patterns is revealing 
the utility of age of onset-cohort SNP stratification for 
detection of fine genome-mapped clusters of interval 
SNPs having highly significant associations with 
unexpected genomic areas (unpublished observations by 
Simons MJ, Bei JX, Cui Q, Lei JJ, Satterley K, Tait BD, 
and Zeng YX, 2011). 

The HLA connection with NPC has an additional 
intrigue. Among the thousands of HLA gene varieties or 
alleles, one, named Singapore-2 when it was first 
discovered, later assigned as HLA-B*46:01, arose in
Chinese aborigines by a rare mechanism some tens of
thousands of years ago and has an intimate but
unresolved association not only with NPC but also
with some autoimmune diseases. The HLA-B*46:01 allele
is inherited with a range of alleles at HLA loci on either
side of the HLA-B locus as extended haplotypes. 
Analysis of the type and frequency of these multi-locus 
extended haplotypes provides an indication of genetic 
affinities of carrier populations (unpublished observations
by Yuliwulandari R, Simons MJ, and Tokunaga K, 2011). 
However, there are inherent errors in assignment of 
haplotypes based on linkage disequilibrium estimation. 
For instance, a pedigree study of Chinese Han families 
revealed that 65% (235) of 362 three-locus haplotypes 
were observed only once ("singletons"). This
corresponds to 45% of individuals having two singleton 
haplotypes, a situation which precludes di-haplotype 
assignment by any likelihood estimation. It is thus 
mandatory to utilise pedigrees or haploid DNA for 
accurate di-haplotype assignment in population affinity 
studies. 
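
The singleton problem can be made concrete with a small sketch. The Python below uses invented toy haplotype labels (not data from the pedigree study) and a hypothetical helper, singleton_summary, to count haplotypes observed only once and the share of individuals carrying two of them. The only figure taken from the text is 235 of 362; everything else is an illustrative assumption.

```python
from collections import Counter

def singleton_summary(individuals):
    """individuals: list of (hap1, hap2) tuples of phased haplotype labels.

    Returns (fraction of distinct haplotypes observed once,
             fraction of individuals whose two haplotypes are both singletons).
    """
    counts = Counter(h for pair in individuals for h in pair)
    singletons = {h for h, n in counts.items() if n == 1}
    frac_types = len(singletons) / len(counts)
    both = sum(1 for a, b in individuals if a in singletons and b in singletons)
    return frac_types, both / len(individuals)

# Toy, invented three-locus haplotypes for four individuals (illustration only).
toy = [
    ("A2-B46-DR9", "A33-B58-DR3"),
    ("A2-B46-DR9", "A11-B13-DR15"),
    ("A24-B54-DR4", "A30-B13-DR7"),
    ("A33-B58-DR3", "A2-B46-DR9"),
]
frac_types, frac_individuals = singleton_summary(toy)

# The proportion quoted in the text: 235 singletons among 362 haplotypes.
reported = 235 / 362
```

In the toy data only the third individual carries two singleton haplotypes; for such a person no frequency-based estimate has any prior observation to lean on, which is why pedigree or haploid DNA is needed.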

The two prototypic haplotypes A2-B46-DR9 and 
A33-B58-DR3 are known to confer risk for NPC but 
for different ages of onset. While both extend over
megabases (unpublished data by Shen M, Chia JM, 
Chan SH, and Ren EC), it is unclear how far 
centromeric in the HLA complex the primary haplotypes, 
or variants thereof, extend. In seeking to characterize 
any association with the main centromeric loci, 
HLA-DPA1 / HLA-DPB1, the heterodimeric combination of
HLA-DPA1*04:01 / DPB1*13:01 was found to be a common
accompaniment of HLA-B*46:01. At least three groups
have sequenced intron and intergenic components of the
DPA1*04:01 allele and observed a high level of
sequence variation (the third is by Wood JM, Simons
MJ, and Ashdown ML, 2004 in GenBank: nucleotide). 
Over a length of at least 8 kb, the HLA-DPA1*04:01 
sequence has more SNPs than are present in the sum 
of the remaining 27 HLA-DPA1 alleles and as such is 
possibly unique among human patterns (unpublished 
observations by Simons MJ and Varney MD, 2011). 
This segment recombines at sites including junctions 
with repeat sequences as befitting a Mendelian unit of 
genetic inheritance. It has close sequence similarity to 
Gorilla, with no obvious homology to Chimpanzee in 
current databases. Together these two observations 
suggest that a reticulate model of primate evolution may 
be more appropriate to represent genomic segmental 
inheritance than simple phylogeny. As an ancient highly 
polymorphic sequence, an association with earlier 
hominins needs to be considered. Recent evidence of 
genetic mixture between Neanderthals and modern 
humans was interpreted as favouring gene flow from 
Neanderthals into modern humans when they first left 
sub-Saharan Africa because Neanderthals were found to 
be equally distantly related to all non-Africans.
However, the HLA-DPA1 sequence shows sufficiently
high divergence to allow the possibility of later
interbreeding with modern humans in western Asia.
Any explanation will have to take account of the fact that the
sequence is concentrated in southeast Asia, including
in modern day Dai speakers that have descended from
the Bai-Yue, so there are questions concerning hominid
migration, and whether the sequence represents the
eastward trek of Homo neanderthalensis or even the
survival of a much earlier hominin, Homo erectus.

Turnbull also alerts us to the need to recognise 
competing hypotheses at many levels, not only 
concerning NPC risk population migrations during which 
recombination and other rearrangement mechanisms 
result in separation of causative elements and marker 
traits, as reflected in loss of Chinese-associated HLA 
markers in descendant populations, but also at the level 
of molecular mechanisms. For instance, in addition to 
genomic genetics as factors in NPC causation, there are 
two other main candidate categories: (1) infection with, 
and altered immune responsiveness to, EBV; and (2)
deleterious dietary substances and practices.

EBV infection is especially important, at three levels. 
The first level is the utility of EBV infection response in 
identifying individuals at high NPC risk. The contribution
is not in itself sufficient to achieve cost-effective,
clinically useful value, even among highest NPC risk
normal family members of multiple case families, let
alone in the general population, since only a small
proportion of infected individuals present with
EBV-specific IgA seropositivity, but it does provide a
majority biomarker contribution to early detection. A
recent publication concerning host homologous
recombination repair (HRR) system participation in EBV
lytic replication suggested a potential mechanism to
influence EBV reactivation status and thus seropositivity.
Variant alleles of six HRR system-affecting genes could 
well supplement EBV-specific IgA seropositivity towards 
"gap closure" of early NPC detection clinical utility. 

Such an EBV seroimmunity status contribution could 
be further supplemented towards "gap closure" by 
risk-conferring HLA alleles detected, not by routine 
sequence based typing, but by microarray chip-bearing 
tag SNPs. This could be achieved by utilising a risk 
score concept that selects a range of HLA alleles from 
alleles rare or absent in NPC (such as HLA-A*31:01 in 
Chinese, HLA-A*23:01 in Tunisians, HLA-B*44:03:2 in 
Thais) at the one extreme, to the components of high 
risk haplotypes at the other, in a manner similar to celiac 
disease risk identification.
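
One way to picture the risk score concept is as a simple additive score over the HLA alleles a person carries. The sketch below is only an illustration under stated assumptions: the function hla_risk_score and the weight table are hypothetical, and the weights are placeholder values, not estimates from any study. Allele names are those mentioned in the text.

```python
# Hypothetical weights: positive for components of high risk haplotypes,
# negative for alleles described in the text as rare or absent in NPC.
# The numbers are placeholders for illustration, not study estimates.
HYPOTHETICAL_WEIGHTS = {
    "HLA-A2": 1.0,         # component of the A2-B46-DR9 haplotype
    "HLA-B*46:01": 1.0,    # component of the A2-B46-DR9 haplotype
    "HLA-A*31:01": -1.0,   # rare or absent in NPC among Chinese
}

def hla_risk_score(alleles, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of the carried alleles; alleles not in the table score 0."""
    return sum(weights.get(allele, 0.0) for allele in alleles)

high = hla_risk_score(["HLA-A2", "HLA-B*46:01"])
low = hla_risk_score(["HLA-A*31:01", "HLA-A*11:01"])
```

A practical score would need population-specific weights, since the alleles that are rare or absent in NPC differ between Chinese, Tunisians and Thais.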

The second level of EBV consideration is whether 
the HLA associations involve EBV peptide presentation 
or other direct involvement of HLA alleles as cell 
surface-presenting immune receptors/ligands, or whether 
the connection is indirect, reflecting linked genetic 
lesions referred to as disease-association (DA) or 
disease-susceptibility (DS) loci.

The third level for consideration is the role of EBV in
cancer genesis. A striking instance of the complexity is
the association of EBV with both NPC and salivary
adenocarcinoma in the Inuit. Among multiple NPC case
Inuit families, increased cancer proneness is to both of
the EBV-associated tumors, whereas in Chinese only
NPC is observed. Furthermore, EBV seroreactivity
has a different pattern from Chinese in that the very high
prevalence of anti-VCA IgA precludes the utility of this
antibody for NPC screening among Inuits.
example of how a complex disease like NPC can have a 
definable genetic origin and be carried as a genetic 
marker by a specific population yet the risk distinction, 
here between risk-originating Chinese and descendant 
Inuits, and between Arctic-at-risk Inuit and 
Siberian-not-at-risk Inuit, is not a simple story of genetic 
determination and geographic spread. The high risk of 
carcinoma of the nasopharynx and salivary glands 
observed in Arctic Inuit populations is maintained after 
migration to the low incidence area of Denmark, 
indicating that genetic factors acting early in life are 
etiologically important for these cancers.

It needs to be remembered that genetic variations 
arise in a single individual. For the changes to become 
sufficiently established to account for an effect on a 
population, the operation of positive selection is required 
on the reproductive age group. The question is: what
are the selective processes, and what are their genic
targets and mechanisms? Whether origination occurs in
a single location, as in the Wee unitary NPC origin
hypothesis, or is geographically distributed, evolutionary
time provides ample opportunity for the emergence of a
multiplicity of genetic defects in different locations within
the MHC and elsewhere in the genome. Recombination
and other genomic rearrangements are a sufficient
explanation for dissociation of linkage between candidate
HLA markers and yet-to-be-discovered "causal"
variants, together with balancing selection, so it should
surprise no one that, although HLA-B*46:01 is not
present in medium NPC occurrence populations such as
Maori and Polynesian islanders, Taiwan Paiwan
aborigines and Maghreb North Africans, HLA-B*46:01-linked
genetic lesions may still be found to be shared
between disparate NPC risk populations.

The major contributions of the recent GWAS were to 
highlight the well-established HLA associations, and to 
reveal three new genomic locations. Collectively, all
these genes can contribute alleles towards useful 
diagnostics of disease risk and of early disease 
occurrence. In addition, the GWAS revealed genetic 
individuality even between individuals who were 
interval-HLA locus typed as homozygous over megabases 
because they differ in SNP haplotypes between 
canonical HLA locus allele types. Thus two individuals 
who share common HLA "haplotypes" such as 
A2-B46-DR9 and A33-B58-DR3, even as apparent 
homozygotes, are dissimilar at intervening loci. The
situation is not different from that at other genetic loci
where it is common practice to group separated SNPs at 
protein-coding sequences as "haplotypes". While it is 
widely assumed that phase-true "haplotypes" will be able 
to be assembled by ever more sophisticated 
bioinformatic algorithms, only phase-discrete analysis by 
separate chromosome sequencing between and through 
protein-coding DNA provides certainty of characterisation 
of the entire inherited diplotype. Thus definition of the 
genetic elements conferring risk for every disease will 
require resolution of the diploid genome as a di-haplome. 
Aside from the importance to classification of cis- and 
trans- phase in NPC genetic analysis, it also follows that 
even di-haploid matching for transplantation is 
incomplete unless the inter-HLA genotype is defined. 
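
The phase problem can be illustrated with a minimal sketch. Given unphased genotypes at several sites, the hypothetical function compatible_diplotypes below enumerates every pair of haplotypes consistent with them. The site alleles are invented for illustration. With k heterozygous sites there are 2^(k-1) equally compatible diplotypes, and genotype data alone cannot choose between them.

```python
from itertools import product

def compatible_diplotypes(genotypes):
    """genotypes: unphased per-site allele pairs, e.g. [("A", "G"), ("C", "C")].

    Returns the set of unordered haplotype pairs consistent with the genotypes.
    """
    diplotypes = set()
    # At each site, either allele may sit on the first chromosome.
    for choice in product(*[[(a, b), (b, a)] for a, b in genotypes]):
        hap1 = "".join(site[0] for site in choice)
        hap2 = "".join(site[1] for site in choice)
        # Sorting makes the pair unordered, so mirror images count once.
        diplotypes.add(tuple(sorted((hap1, hap2))))
    return diplotypes

# Invented example: two heterozygous sites and one homozygous site.
two_het = compatible_diplotypes([("A", "G"), ("C", "T"), ("G", "G")])

# Invented example: three heterozygous sites.
three_het = compatible_diplotypes([("A", "G"), ("C", "T"), ("A", "T")])
```

Separate sequencing of each chromosome removes this ambiguity directly, whereas statistical phasing can only rank the alternatives by likelihood.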

Such a whole genome di-haploid typing strategy for 
"complete" autosomal archaeogenetic population 
profiling will also enable an interpretative tension to be 
held between the dichotomy of a phylogenetic 
arborescent framework and a reticulate form of modelling 
that is independent of complications arising from 
population selective pressures or of neutral drift. 

For the purpose of utilising DNA to study populations 
at differential disease risk, and here to testing of the 
unitary origin hypothesis of NPC risk among seemingly 
unrelated populations, it will soon become technically 